statistic, principal component analysis and particle swarm optimization

Author

  • Harun Uğuz
Abstract

Today, the number of text documents in digital form is steadily increasing, and text categorization has become a key technology for organizing text data. A major problem in text categorization is the very large number of features, most of which are useless, irrelevant or redundant for the task and can therefore degrade classification performance. To address this deficiency, feature selection is often used in text categorization to reduce the dimensionality of the feature space and improve categorization performance. In this study, a hybrid approach based on the χ² statistic, particle swarm optimization (PSO) and principal component analysis (PCA) is proposed to improve the performance of text categorization. First, each term in the documents is ranked by its importance for classification using the χ² statistic. Then PSO (a feature selection method) and PCA (a feature extraction method) are applied separately to the terms, taken in decreasing order of importance, to reduce the dimensionality. In this way, less important terms are ignored during text categorization, feature selection and feature extraction are applied only to the most important terms, and the computational time and complexity of the application are reduced. To evaluate the effectiveness of the proposed model, experiments were conducted with the K-nearest neighbor (KNN) and C4.5 decision tree algorithms on the Reuters-21578 and Classic3 dataset collections. The experimental evaluation showed that the proposed model was effective for text categorization.
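The two-stage pipeline described above can be sketched in NumPy. Stage 1 ranks terms by χ². Stage 2 then runs PSO selection or PCA extraction on the top-ranked terms, and k-NN does the classification. This is a minimal illustration under stated assumptions, not the author's implementation. The following are all hypothetical choices rather than settings from the paper:

- the function names
- aggregating χ² by taking the maximum over classes
- the binary PSO settings (inertia 0.7, acceleration coefficients 1.5, sigmoid transfer)
- the leave-one-out 1-NN fitness
- the toy corpus

The C4.5 classifier is not reproduced.

```python
# Illustrative sketch of a chi2 -> (PSO | PCA) -> k-NN text-categorization pipeline.
# All function names, parameter values and the toy data are hypothetical, not from the paper.
import numpy as np


def chi2_scores(X, y):
    """Chi-square score of every term (column of X) with respect to labels y.

    chi2(t, c) = N (AD - CB)^2 / ((A + C)(B + D)(A + B)(C + D)) on the 2x2
    term/class table; the maximum over classes is kept (an assumed aggregation).
    """
    P = (X > 0).astype(float)        # 1 if the term occurs in the document
    N = P.shape[0]
    scores = np.zeros(P.shape[1])
    for c in np.unique(y):
        in_c = y == c
        A = P[in_c].sum(axis=0)      # documents in c that contain the term
        B = P[~in_c].sum(axis=0)     # documents outside c that contain the term
        C = in_c.sum() - A           # documents in c without the term
        D = (~in_c).sum() - B        # documents outside c without the term
        num = N * (A * D - C * B) ** 2
        den = (A + C) * (B + D) * (A + B) * (C + D)
        chi = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        scores = np.maximum(scores, chi)
    return scores


def pca_transform(X, n_components):
    """Feature extraction: project centred X onto its top principal axes (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def knn_loo_accuracy(X, y, k=1):
    """Leave-one-out k-NN accuracy with Euclidean distance and majority vote."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)      # a document may not vote for itself
    nn = np.argsort(d, axis=1)[:, :k]
    votes = np.array([np.bincount(y[row]).argmax() for row in nn])
    return float(np.mean(votes == y))


def binary_pso_select(X, y, n_particles=8, iters=15, seed=0):
    """Feature selection with binary PSO (sigmoid transfer); fitness is LOO 1-NN accuracy."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]

    def fitness(mask):
        return knn_loo_accuracy(X[:, mask], y) if mask.any() else 0.0

    pos = rng.random((n_particles, m)) < 0.5      # each particle is a feature mask
    vel = rng.uniform(-1.0, 1.0, (n_particles, m))
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, m))
        x = pos.astype(float)
        vel = (0.7 * vel + 1.5 * r1 * (pbest.astype(float) - x)
               + 1.5 * r2 * (gbest.astype(float) - x))
        vel = np.clip(vel, -4.0, 4.0)
        # resample each bit: set with probability sigmoid(velocity)
        pos = rng.random((n_particles, m)) < 1.0 / (1.0 + np.exp(-vel))
        fit = np.array([fitness(p) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest, float(pbest_fit.max())


# Toy corpus: 20 documents x 30 terms, two classes.
rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
X = rng.integers(0, 2, size=(20, 30)).astype(float)  # random "noise" terms
X[:, 0] = y == 0                                     # term 0 occurs only in class 0
X[:, 1] = y == 1                                     # term 1 occurs only in class 1

scores = chi2_scores(X, y)
top = np.argsort(-scores)[:10]                  # stage 1: keep 10 top-ranked terms
Z = pca_transform(X[:, top], n_components=3)    # stage 2a: PCA feature extraction
mask, best = binary_pso_select(X[:, top], y)    # stage 2b: PSO feature selection
print("chi2 of terms 0 and 1:", scores[:2])
print("PCA + 1-NN accuracy:", knn_loo_accuracy(Z, y))
print("PSO kept", int(mask.sum()), "terms, 1-NN accuracy:", best)
```

In this sketch, cost is bounded by keeping only the top-ranked terms before running PCA or PSO. The SVD and every PSO fitness evaluation then work on 10 columns instead of the full vocabulary.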

Similar resources

Intelligent Condition Diagnosis Method Based on Adaptive Statistic Test Filter and Diagnostic Bayesian Network

A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise; it is based on statistical hypothesis testing in the frequency domain to evaluate the similarity between the reference signal (noise signal) and the original signal, and rem...

Predicting the Young's Modulus and Uniaxial Compressive Strength of a typical limestone using the Principal Component Regression and Particle Swarm Optimization

In geotechnical engineering, rock mechanics and engineering geology, depending on the project design, the uniaxial strength and static Young's modulus of rocks are of vital importance. The direct determination of the aforementioned parameters in the laboratory, however, requires intact, high-quality cores, and the preparation of their specimens has some limitations. Moreover, performing thes...

An efficient approach for availability analysis through fuzzy differential equations and particle swarm optimization

This article formulates a new technique for behavior analysis of systems through fuzzy Kolmogorov's differential equations and Particle Swarm Optimization. To handle the uncertainty in data, differential equations have been formulated by Markov modeling of the system in a fuzzy environment. First, the solution of these derived fuzzy Kolmogorov's differential equations has been found by Runge-Kutta four...

Analog Circuit Soft Fault Diagnosis based on PCA and PSO-SVM

Given the complexity and diversity of analog circuit faults, an analog circuit fault diagnosis method based on principal component analysis (PCA) and a particle swarm optimization (PSO) support vector machine (SVM) is proposed. It uses principal component analysis and data normalization as preprocessing, then feeds the reduced-dimension fault features into the support vector machine for diagnosis, and particl...

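Klasifikasi Data Cardiotocography Dengan Integrasi Metode Neural Network Dan Particle Swarm Optimization (Cardiotocography Data Classification by Integrating Neural Network and Particle Swarm Optimization Methods)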

Backpropagation (BP) is a method used in training a Neural Network (NN) to determine suitable weight parameters. Determining the weight parameters with backpropagation is strongly influenced by the choice of its learning rate (LR) value. Using a suboptimal learning rate results in long computation time or a classification accuracy that...

Publication date: 2012